Abstract

Formal and distributional semantic models offer complementary benefits in
modeling meaning. The categorical compositional distributional model of
meaning of Coecke et al. (2010) (abbreviated to DisCoCat in the title)
combines aspects of both to provide a general framework in which meanings
of words, obtained distributionally, are composed using methods from the
logical setting to form sentence meaning. Concrete consequences of this
general abstract setting and applications to empirical data are under
active study (Grefenstette et al., 2011; Grefenstette and Sadrzadeh, 2011).
In this paper, we extend this study by examining transitive verbs,
represented as matrices in a DisCoCat. We discuss three ways of
constructing such matrices, and evaluate each method in a disambiguation
task developed by Grefenstette and Sadrzadeh (2011).

1 Background 

The categorical distributional compositional model of meaning of Coecke et
al. (2010) combines the modularity of formal semantic models with the
empirical nature of vector space models of lexical semantics. The meaning
of a sentence is defined to be the application of its grammatical
structure (represented in a type-logical model) to the Kronecker product
of the meanings of its words, as computed in a distributional model. The
concrete and experimental consequences of this setting, and other models
that aim to bring together the logical and distributional approaches, are
active topics in current natural language semantics research; see e.g.
(Grefenstette et al., 2011; Grefenstette and Sadrzadeh, 2011; Clark et
al., 2010; Baroni and Zamparelli, 2010; Guevara, 2010; Mitchell and
Lapata, 2008).

In this paper, we focus on our recent concrete DisCoCat model
(Grefenstette and Sadrzadeh, 2011), and in particular on nouns composed
with transitive verbs. In that model, the meaning of a transitive sentence
'sub tverb obj' is obtained by taking the component-wise multiplication of
the matrix of the verb with the Kronecker product of the vectors of
subject and object:



$$\overrightarrow{sub\ tverb\ obj} = \overline{tverb} \odot (\overrightarrow{sub} \otimes \overrightarrow{obj}) \qquad (1)$$
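As a minimal sketch of Equation (1), assuming plain Python lists stand in for the row vectors and the verb matrix, the composition can be written as below. The function names `kronecker` and `compose` and the toy numbers are illustrative assumptions, not part of the original implementation.

```python
def kronecker(sub, obj):
    """Kronecker (outer) product of two r-dimensional row vectors: an r x r matrix."""
    return [[s * o for o in obj] for s in sub]


def compose(tverb, sub, obj):
    """Equation (1): component-wise product of the verb matrix with sub (x) obj."""
    outer = kronecker(sub, obj)
    return [[t * x for t, x in zip(t_row, x_row)]
            for t_row, x_row in zip(tverb, outer)]


# Toy 2-dimensional example; the numbers are invented for illustration.
sub = [1.0, 2.0]
obj = [3.0, 4.0]
tverb = [[0.5, 0.0],
         [1.0, 2.0]]

sentence = compose(tverb, sub, obj)
print(sentence)  # [[1.5, 0.0], [6.0, 16.0]]
```

The result is itself an r x r matrix, which is why sentence similarity later has to be computed on matrices rather than on row vectors.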

In most logical models, transitive verbs are modeled as relations; in the
categorical model the relational nature of such verbs manifests itself in
their matrix representation: if subject and object are each
$r$-dimensional row vectors in some space $N$, the verb will be an
$r \times r$ matrix in the space $N \otimes N$. There are different ways
of learning the weights of this matrix. In (Grefenstette and Sadrzadeh,
2011), we developed and implemented one such method on the data from the
British National Corpus. The matrix of each verb was constructed by taking
the sum of the Kronecker products of all of the subject/object pairs
linked to that verb in the corpus. We refer to this method as the indirect
method, because the weight $c_{ij}$ is obtained from the weights of the
subject and object vectors (computed via co-occurrence with bases $n_i$
and $n_j$ respectively), rather than directly from the context of the verb
itself, as would be the case in lexical distributional models. This
construction method was evaluated against an extension of Mitchell and
Lapata (2008)'s disambiguation task from intransitive to transitive
sentences. We showed and discussed how and why our method, which is
moreover scalable and respects the grammatical structure of the sentence,
yielded better results than other known models of semantic vector
composition.
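The indirect method described above can be sketched as follows. This is an illustration only: it assumes the subject and object vectors for each corpus occurrence of the verb are already available as Python lists, and the name `indirect_verb_matrix` and the toy data are hypothetical.

```python
def kronecker(sub, obj):
    """Kronecker (outer) product of two r-dimensional row vectors."""
    return [[s * o for o in obj] for s in sub]


def indirect_verb_matrix(pairs, r):
    """Sum of sub (x) obj over all (subject, object) vector pairs seen with one verb."""
    matrix = [[0.0] * r for _ in range(r)]
    for sub, obj in pairs:
        outer = kronecker(sub, obj)
        for i in range(r):
            for j in range(r):
                matrix[i][j] += outer[i][j]
    return matrix


# Hypothetical subject/object vectors for two corpus occurrences of one verb.
pairs = [([1.0, 0.0], [0.0, 2.0]),
         ([1.0, 1.0], [3.0, 0.0])]

verb = indirect_verb_matrix(pairs, r=2)
print(verb)  # [[3.0, 2.0], [3.0, 0.0]]
```

Entry (i, j) of the result accumulates the product of the i-th subject weight and the j-th object weight, so it never consults the verb's own context vector.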

As a motivation for the current paper, note that there are at least two
different factors at work in Equation (1): one is the matrix of the verb,
denoted by $\overline{tverb}$, and the other is the Kronecker product of
subject and object vectors
$\overrightarrow{sub} \otimes \overrightarrow{obj}$. Our model's
mathematical formulation of composition prohibits us from changing the
latter Kronecker product, but the 'content' of the verb matrices can be
built through different procedures.

In recent work we used a standard lexical distributional model for nouns
and engineered our verbs to have a more sophisticated structure because of
the higher dimensional space they occupy. In particular, we argued that
the resulting matrix of the verb should represent 'the extent according to
which the verb has related the properties of subjects to the properties of
its objects', developed a general procedure to build such matrices, then
studied their empirical consequences. One question remained open: what
would be the consequence of starting from the standard lexical vector of
the verb, then encoding it into the higher dimensional space using
different (possibly ad-hoc but nonetheless interesting) mathematically
inspired methods?

In a nutshell, the lexical vector of the verb is denoted by
$\overrightarrow{tverb}$ and, similar to the vectors of subject and
object, it is an $r$-dimensional row vector. Since the Kronecker product
of subject and object
($\overrightarrow{sub} \otimes \overrightarrow{obj}$) is $r \times r$, in
order to make $\overrightarrow{tverb}$ applicable in Equation (1), i.e. to
be able to substitute it for $\overline{tverb}$, we need to encode it into
an $r \times r$ matrix in the $N \otimes N$ space. In what follows, we
investigate the empirical consequences of three different encoding
methods.

2 From Vectors to Matrices 

In this section, we discuss three different ways of encoding
$r$-dimensional lexical verb vectors into $r \times r$ verb matrices, and
present empirical results for each. We use the additional structure that
the Kronecker product provides to represent the relational nature of
transitive verbs. The results are an indication that the extra information
contained in this larger space contributes to higher quality composition.

One way to encode an $r$-dimensional vector as an $r \times r$ matrix is
to embed it as the diagonal of that matrix. It remains open to decide what
the non-diagonal values should be. We experimented with 0s and 1s as
padding values. If the vector of the verb is $[c_1, c_2, \cdots, c_r]$,
then for the 0 case (referred to as 0-diag) we obtain the following
matrix:





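$$\overline{tverb} = \begin{pmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_r \end{pmatrix}$$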



For the 1 case (referred to as 1-diag) we obtain the following matrix:





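$$\overline{tverb} = \begin{pmatrix} c_1 & 1 & \cdots & 1 \\ 1 & c_2 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & c_r \end{pmatrix}$$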



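We also considered a third case where the vector is encoded into a matrix
by taking the Kronecker product of the verb vector with itself:

$$\overline{tverb} = \overrightarrow{tverb} \otimes \overrightarrow{tverb}$$

So for $\overrightarrow{tverb} = [c_1, c_2, \cdots, c_r]$ we obtain the
following matrix:

$$\overline{tverb} = \begin{pmatrix} c_1c_1 & c_1c_2 & \cdots & c_1c_r \\ c_2c_1 & c_2c_2 & \cdots & c_2c_r \\ \vdots & \vdots & \ddots & \vdots \\ c_rc_1 & c_rc_2 & \cdots & c_rc_r \end{pmatrix}$$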
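The three encodings of this section can be sketched in a few lines. This is a minimal illustration assuming the lexical verb vector is a plain Python list; the helper names `diag_encoding` and `self_kronecker` and the example vector are hypothetical.

```python
def diag_encoding(verb, padding):
    """Embed the verb vector on the diagonal; fill every other cell with `padding`."""
    r = len(verb)
    return [[verb[i] if i == j else padding for j in range(r)]
            for i in range(r)]


def self_kronecker(verb):
    """Kronecker product of the verb vector with itself: entry (i, j) is c_i * c_j."""
    return [[ci * cj for cj in verb] for ci in verb]


# Invented 3-dimensional lexical verb vector.
verb = [2.0, 3.0, 5.0]

zero_diag = diag_encoding(verb, 0.0)   # 0-diag
one_diag = diag_encoding(verb, 1.0)    # 1-diag
v_kron_v = self_kronecker(verb)        # v (x) v

print(zero_diag)  # [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]]
print(one_diag)   # [[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 5.0]]
print(v_kron_v)   # [[4.0, 6.0, 10.0], [6.0, 9.0, 15.0], [10.0, 15.0, 25.0]]
```

Only the third encoding places verb information in the off-diagonal cells, which is where the subject and object dimensions differ.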



3 Degrees of synonymity for sentences 

The degree of synonymity between the meanings of two sentences is computed
by measuring their geometric distance. In this work, we used the cosine
measure. For two sentences 'sub$_1$ tverb$_1$ obj$_1$' and
'sub$_2$ tverb$_2$ obj$_2$', this is obtained by taking the Frobenius
inner product of $\overrightarrow{sub_1\ tverb_1\ obj_1}$ and
$\overrightarrow{sub_2\ tverb_2\ obj_2}$. We use the Frobenius product
rather than the dot product because the calculation in Equation (1)
produces matrices rather than row vectors. We normalized the outputs by
the product of the lengths of their corresponding matrices.
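The cosine measure on matrices can be sketched as below, assuming each sentence meaning is a list of rows and taking "length" to mean the Frobenius norm. The names `frobenius_inner` and `matrix_cosine` are illustrative, and returning 0 for an all-zero matrix is an assumption made here to avoid division by zero, not something the paper specifies.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product: sum of the products of corresponding entries."""
    return sum(x * y
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def matrix_cosine(a, b):
    """Frobenius inner product normalized by the product of the two matrix lengths."""
    norm_a = math.sqrt(frobenius_inner(a, a))
    norm_b = math.sqrt(frobenius_inner(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero sentence matrix is similar to nothing
    return frobenius_inner(a, b) / (norm_a * norm_b)


# Invented sentence matrices with lengths 5 and 3.
m1 = [[3.0, 0.0], [0.0, 4.0]]
m2 = [[3.0, 0.0], [0.0, 0.0]]

similarity = matrix_cosine(m1, m2)  # 9 / (5 * 3)
print(round(similarity, 6))  # 0.6
```

For matrices with non-negative entries, as in the toy data here, the score falls in [0, 1].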

4 Experiment 

In this section, we describe the experiment used to evaluate and compare
these three methods. The experiment is on the dataset developed in
(Grefenstette and Sadrzadeh, 2011).

Parameters We used the parameters described by Mitchell and Lapata (2008)
for the noun and verb vectors. All vectors were built from a lemmatised
version of the BNC. The noun basis was the 2000 most common context words;
basis weights were the probability of context words given the target word
divided by the overall probability of the context word. These features
were chosen to enable easy comparison of our experimental results with
those of Mitchell and Lapata's original experiment, in spite of the fact
that there may be more sophisticated lexical distributional models
available.

Task This is an extension of Mitchell and Lapata (2008)'s disambiguation
task from intransitive to transitive sentences. The general idea behind
the transitive case (similar to the intransitive one) is as follows:
meanings of ambiguous transitive verbs vary based on their subject-object
context. For instance the verb 'meet' means 'satisfy' in the context 'the
system met the criterion' and it means 'visit' in the context 'the child
met the house'. Hence if we build meaning vectors for these sentences
compositionally, the degrees of synonymity of the sentences can be used to
disambiguate the meanings of the verbs in them. Suppose a verb has two
meanings a and b and that it has occurred in two sentences. Then if in
both of these sentences it has its meaning a, the two sentences will have
a high degree of synonymity, whereas if in one sentence the verb has
meaning a and in the other meaning b, the sentences will have a lower
degree of synonymity. For instance 'the system met the criterion' and 'the
system satisfied the criterion' have a high degree of semantic similarity,
and similarly for 'the child met the house' and 'the child visited the
house'. This degree decreases for the pair 'the child met the house' and
'the child satisfied the house'.

Dataset The dataset is built using the same guidelines as Mitchell and
Lapata (2008), using transitive
verbs obtained from CELEX paired with subjects and objects. We first
picked 10 transitive verbs from the most frequent verbs of the BNC. For
each verb, two different non-overlapping meanings were retrieved, by using
the JCN (Jiang Conrath) information content synonymity measure of WordNet
to select maximally different synsets. For instance for 'meet' we obtained
'visit' and 'satisfy'. For each original verb, ten sentences containing
that verb with the same role were retrieved from the BNC. Examples of such
sentences are 'the system met the criterion' and 'the child met the
house'. For each such sentence, we generated two other related sentences
by substituting their verbs by each of their two synonyms. For instance,
we obtained 'the system satisfied the criterion' and 'the system visited
the criterion' for the first meaning and 'the child satisfied the house'
and 'the child visited the house' for the second meaning. This procedure
provided us with 200 pairs of sentences.
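The substitution step that produces the sentence pairs can be sketched as follows. This is only an illustration: it assumes each sentence is stored as a lemmatised (subject, verb, object) triple such as 'system meet criterion', and the function name `make_pairs` is hypothetical.

```python
def make_pairs(subject, verb, obj, synonyms):
    """Pair the original triple with one triple per synonym substituted for the verb."""
    original = (subject, verb, obj)
    return [(original, (subject, synonym, obj)) for synonym in synonyms]


# One sentence of the verb 'meet' and its two retrieved meanings.
pairs = make_pairs("system", "meet", "criterion", ["satisfy", "visit"])
for original, substituted in pairs:
    print(" ".join(original), "|", " ".join(substituted))

# 10 verbs x 10 sentences per verb x 2 synonyms per verb.
total_pairs = 10 * 10 * 2
print(total_pairs)  # 200
```

Each original sentence therefore contributes one pair in which the substituted synonym fits the context and one in which it does not.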

The dataset was split into four non-identical sections of 100 entries such
that each sentence appears in exactly two sections. Each section was given
to a group of evaluators who were asked to assign a similarity score to
simple transitive sentence pairs formed from the verb, subject, and object
provided in each entry (e.g. 'the system met the criterion' from 'system
meet criterion'). The scoring scale for human judgement was [1, 7], where
1 was most dissimilar and 7 most similar.

Separately from the group annotation, each pair in the dataset was given
the additional arbitrary classification of HIGH or LOW similarity by the
authors.

Evaluation Method To evaluate our methods, we first applied our formulae
to compute the similarity of each phrase pair on a scale of [0, 1] and
then compared it with human judgement of the same pair. The comparison was
performed by measuring Spearman's ρ, a rank correlation coefficient
ranging from -1 to 1. This provided us with the degree of correlation
between the similarities as computed by our model and as judged by human
evaluators.
Following Mitchell and Lapata (2008), we also computed the mean of HIGH
and LOW scores. However, these scores were solely based on the authors'
personal judgements and as such (and on their own) do not provide a very
reliable measure. Therefore, like Mitchell and Lapata (2008), the models
were ultimately judged by Spearman's ρ.
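For readers unfamiliar with the measure, Spearman's ρ can be sketched as the Pearson correlation of the two rank sequences, with tied values sharing their average rank. This is the standard textbook definition, not the authors' evaluation script; the function names and the toy scores are assumptions, and the sketch assumes neither score list is constant.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented model similarities in [0, 1] and human scores in [1, 7].
model_scores = [0.9, 0.2, 0.5, 0.7]
human_scores = [7, 1, 3, 6]
rho = spearman_rho(model_scores, human_scores)
print(rho)  # 1.0
```

Because only ranks matter, the differing scales of the model scores and the human scores do not need to be reconciled before comparison.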

The results are presented in Table 1. The additive and multiplicative rows
have, as composition operation, vector addition and component-wise
multiplication. The Baseline is from a non-compositional approach; it is
obtained by comparing the verb vectors of each pair directly and ignoring
their subjects and objects. The UpperBound is set to be inter-annotator
agreement.



Model                 High    Low     ρ
Baseline              0.47    0.44    0.16
Add                   0.90    0.90    0.05
Multiply              0.67    0.59    0.17
Categorical
  Indirect matrix     0.73    0.72    0.21
  0-diag matrix       0.67    0.59    0.17
  1-diag matrix       0.86    0.85    0.08
  v ⊗ v matrix        0.34    0.26    0.28
UpperBound            4.80    2.49    0.62

Table 1: Results of compositional disambiguation.

The indirect matrix performed better than the vectors encoded in diagonal
matrices padded with 0 and 1. However, surprisingly, the Kronecker product
of the lexical verb vector with itself provided better results than all
the above. The results were statistically significant with p < 0.05.

5 Analysis of the Results

Suppose the vector of subject is
$\overrightarrow{sub} = [s_1, s_2, \cdots, s_r]$ and the vector of object
is $\overrightarrow{obj} = [o_1, o_2, \cdots, o_r]$, then the matrix of
$\overrightarrow{sub} \otimes \overrightarrow{obj}$ is:

$$\begin{pmatrix} s_1o_1 & s_1o_2 & \cdots & s_1o_r \\ s_2o_1 & s_2o_2 & \cdots & s_2o_r \\ \vdots & \vdots & \ddots & \vdots \\ s_ro_1 & s_ro_2 & \cdots & s_ro_r \end{pmatrix}$$

After computing Equation (1) for each generation method of
$\overline{tverb}$, we obtain the following three matrices for the meaning
of a transitive sentence:


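0-diag:

$$\begin{pmatrix} c_1s_1o_1 & 0 & \cdots & 0 \\ 0 & c_2s_2o_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_rs_ro_r \end{pmatrix}$$

This method discards all of the non-diagonal information about the subject
and object; for example, there is no occurrence of $s_1o_2$, $s_2o_1$,
etc.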

1-diag:

$$\begin{pmatrix} c_1s_1o_1 & s_1o_2 & \cdots & s_1o_r \\ s_2o_1 & c_2s_2o_2 & \cdots & s_2o_r \\ \vdots & \vdots & \ddots & \vdots \\ s_ro_1 & s_ro_2 & \cdots & c_rs_ro_r \end{pmatrix}$$

This method conserves the information about the subject and object, but
only applies the information of the verb to the diagonals: $s_1$ and
$o_2$, $s_2$ and $o_1$, etc. are never related to each other via the verb.

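v ⊗ v:

$$\begin{pmatrix} c_1c_1s_1o_1 & c_1c_2s_1o_2 & \cdots & c_1c_rs_1o_r \\ c_2c_1s_2o_1 & c_2c_2s_2o_2 & \cdots & c_2c_rs_2o_r \\ \vdots & \vdots & \ddots & \vdots \\ c_rc_1s_ro_1 & c_rc_2s_ro_2 & \cdots & c_rc_rs_ro_r \end{pmatrix}$$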

This method not only conserves the information of the subject and object,
but also applies to them all of the information encoded in the verb. These
data propagate to Frobenius products when computing the semantic
similarity of sentences and justify the empirical results.
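One consequence of the 0-diag form can be checked numerically. The sketch below is an illustration rather than part of the original analysis, and it assumes that the multiplicative baseline multiplies the subject, verb and object vectors component-wise. Under that assumption the Frobenius inner product of two 0-diag sentence matrices equals the dot product of the corresponding multiplicative vectors, which is consistent with the identical Multiply and 0-diag rows in Table 1. All names and numbers are invented.

```python
def zero_diag(verb):
    """0-diag encoding: verb vector on the diagonal, zeros elsewhere."""
    r = len(verb)
    return [[verb[i] if i == j else 0.0 for j in range(r)] for i in range(r)]


def compose(tverb, sub, obj):
    """Equation (1): entry (i, j) is tverb[i][j] * sub[i] * obj[j]."""
    return [[tverb[i][j] * sub[i] * obj[j] for j in range(len(obj))]
            for i in range(len(sub))]


def frobenius_inner(a, b):
    """Sum of the products of corresponding matrix entries."""
    return sum(x * y
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def multiply(verb, sub, obj):
    """Assumed multiplicative baseline: component-wise product of the three vectors."""
    return [c * s * o for c, s, o in zip(verb, sub, obj)]


# Two invented sentences given as (subject, verb, object) vectors.
sub_a, verb_a, obj_a = [1.0, 2.0], [2.0, 1.0], [3.0, 1.0]
sub_b, verb_b, obj_b = [2.0, 1.0], [1.0, 3.0], [1.0, 2.0]

mat_a = compose(zero_diag(verb_a), sub_a, obj_a)
mat_b = compose(zero_diag(verb_b), sub_b, obj_b)
matrix_score = frobenius_inner(mat_a, mat_b)

vec_a = multiply(verb_a, sub_a, obj_a)
vec_b = multiply(verb_b, sub_b, obj_b)
vector_score = sum(x * y for x, y in zip(vec_a, vec_b))

print(matrix_score, vector_score)  # 24.0 24.0
```

The same argument applies to the norms, so the cosine scores of the two models coincide as well.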

The good performance of the $v \otimes v$ matrix relative to the more
complex indirect method is surprising, and certainly demands further
investigation. What is sure is that they each draw upon different aspects
of semantic composition to provide better results. There is certainly room
for improvement and empirical optimisation in both of these
relation-matrix construction methods.

Furthermore, the success of both of these methods relative to the others
examined in Table 1 shows that it is the extra information provided in the
matrix (rather than just the diagonal, representing the lexical vector)
that encodes the relational nature of transitive verbs, thereby validating
in part the requirement suggested in Coecke et al. (2010) and Grefenstette
and Sadrzadeh (2011) that relational word vectors live in a space the
dimensionality of which is a function of the arity of the relation.